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Abstract 

Based on the methodological similarity between sparse signal reconstruction and 
system identification, a new approach for sparse signal reconstruction in compressive 
sensing (CS) is proposed in this paper. This approach employs a stochastic gradient- 
based adaptive filtering framework, which is commonly used in system identification, to 
solve the sparse signal reconstruction problem. Two typical algorithms for this problem: 
/o-least mean square (Zq-LMS) algorithm and Zo-exponentially forgetting window LMS 
(Zq-EFWLMS) algorithm are hence introduced here. Both the algorithms utilize a zero 
attraction method, which has been implemented by minimizing a continuous approxi- 
mation of Iq norm of the studied signal. To improve the performances of these proposed 
algorithms, an Zo-zero attraction projection (Zg-ZAP) algorithm is also adopted, which 
has effectively accelerated their convergence rates, making them much faster than the 
other existing algorithms for this problem. Advantages of the proposed approach, such 
as its robustness against noise etc., are demonstrated by numerical experiments. Key- 
words: adaptive filter, compressive sensing (CS), least mean square (LMS), sparse 
signal reconstruction, Iq norm, stochastic gradient. 



1 Introduction 

1.1 Overview of Compressive Sampling 

Compressive sensing or compressive sampling (CS) [IHl] is a novel technique that enables 
sampling below Nyquist rate, without (or with little) sacrificing reconstruction quality. It is 
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based on exploiting signal sparsity in some typical domains. A brief review on CS is given 
here. 

For a piece of finite-length, real- valued 1-D discrete signal x, its representation in domain 
* is 

N 

X = J]] ViSi = *s, (1) 

1=1 

where x and s are x 1 column vectors, and ^' is an x basis matrix with vectors 
= 1,2,...,A^) as columns. Obviously, x and s are equivalent representations of the 
signal when ^ is full ranked. Signal x is if-sparse if K out of A^ coefficients of s are nonzero 
in the domain ^. And it is sparse if AT ^ A^. 

Take M (K < M <^ N) linear, non-adaptive measurement of x through a linear trans- 
form which is 

y = $x = = As, (2) 

where $ is an M x A^ matrix, and each of its M rows can be considered as a basis vector, 
usually orthogonal, x is thus transformed, or down sampled, to an M x 1 vector y. 
According to the discussion above, the main task of CS is 

• To design a stable measurement matrix. It is important to make a sensing matrix 
which allows recovery of as many entries of x as possible with as few as M mea- 
surements. The matrix A should satisfy the conditions of Incoherence and restricted 
isometry property (RIP) |3j. Fortunately, simple choice of $ as the random matrix 
can make A satisfy these conditions with high possibility. Common design methods 
include Gaussian measurements, Binary measurements, Fourier measurements, and 
Incoherent measurement 0. The Gaussian measurements are employed in this work, 
i.e., the entries of M x A^ sensing matrix $ are independently sampled from a normal 
distribution with mean zero and variance 1/M (A/'(0, 1/M)). When the basis matrix 
^ (wavelet, Fourier, discrete cosine transform (DCT), etc) is orthogonal, A is also 
independent and identically-distributed (i.i.d.) with A/'(0, 1/M) [1]. 

• To design a signal reconstruction algorithm. The signal reconstruction algorithm aims 
to find the sparsest solution to ([2]), which is ill-conditioned. This will be discussed in 
detail in the following subsection. 

Compressive Sensing methods provide a robust framework that can reduce the number 
of measurements required to estimate a sparse signal. For this reason, CS methods are 
useful in many areas, such as MR imaging [5] and analog-to-digital conversion [6]. 

1.2 Signal Reconstruction Algorithms 

Although CS is a new concept emerged recently, searching for the sparse solution to an 
under-determined system of linear equations ([2]) has always been of significant importance 
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in signal processing and statistics. The main idea is to obtain the sparse solution by adding 
sparse constraint. The sparsest solution can be acquired by taking Iq norm into account, 

min||s||o, s.t. As = y. (3) 

s 

Unfortunately, this criterion is not convex, and the computational complexity of optimizing 
it is Non-Polynomial (NP) hard. To overcome this difficulty, Iq norm has to be replaced 
by simpler ones in terms of computational complexity. For example, the convex li norm is 
used, 

min||s||i, s.t. As = y. (4) 

s 

This idea is known as basis pursuit, and it can be recasted as a linear programming (LP) is- 
sue. A recent body of related research shows that perhaps there are conditions guaranteeing 
a formal equivalence between the Iq norm solution and the li norm solution [1]. 

In the presence of noise and/or imperfect data, however, it is undesirable to fit the 
linear system exactly. Instead, the constraint in ^ is relaxed to obtain the Basis Pursuit 
De-Noise (BPDN) problem, 

min||s||i, s.t. ||y — As||2 < cr, (5) 

s 

where the positive parameter a is an estimation of the noise level in the data. The convex 
optimization problem ([5]) is one possible statement of the least-squares problem regularized 
by the li norm. In fact, the BPDN label is typically applied to the penalized least-squares 
problem, 

min ||y — Asjlo + A||s||i, (6) 

s 

which is proposed by Chen et al. in [7], [8]. The third formulation, 

min ||y — Asjll s.t. ||s||i < r, (7) 

s 

which has an explicit li norm constraint, is often called the Least Absolute Shrinkage and 
Selection Operator (LASSO) [9]. The problems ([5]), ([6]) and ([7]) are identical in some 
situations. The precise relationship among them is discussed in [10], [TT] . 

Many approaches and their variants to these problems have been described by the lit- 
erature. They mainly fall into two basic categories. 

Convex relaxation: The first kind of convex optimization methods to solve problems 
([5]), dH]) and ^ includes interior-point (IP) methods [12], [13], which transfer these prob- 
lems to a convex quadratic problem. The standard IP methods cannot handle large scale 
situation. However, many improved IP methods, which exploit fast algorithms for the ma- 
trix vector operations with A and A""", can deal with large scale situation, as demonstrated 
in [7], [H]. High-quality implementations of such IP methods include 11-magic [15] and 
PDCO [H], which use iterative algorithms, such as the conjugate gradients (CG) or LSQR 
algorithm [T7], to compute the search step. The fastest IP method has been recently pro- 
posed to solve dH), different from the method used in the previous works. In such method 
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called llJs, the search operation in each step is done using the Preconditioned Conjugate 
Gradient (PCG) algorithm, which requires less computation, i.e., only the products of A 
and A"^ [IS]. 

The second kind of convex optimization methods to solve problems ([5]), ([6]) and ([7]) 
includes homotopy method and its variants. Homotopy method is employed to find the full 
path of solutions for all nonnegative values of the scalar parameters in the above said three 
problems. When solution is extremely sparse, the methods described in |19H21j can be very 
fast [22]. Otherwise, the path- following methods are slow, which is often the case for large 
scale problems. Other recent developed computational methods include coordinate-wise 
descent methods [23], fixed-point continuation method [23], sequential subspace optimiza- 
tion methods [2^, bound optimization methods [27], iterated shrinkage methods [2H], gra- 
dient methods [29], gradient projection for sparse reconstruction algorithm (GPSR) 
sparse reconstruction by separable approximation (SpaRSA) [25] and Bregman iterative 
method [30l[31]. Some of these methods, such as the GPSR, SpaRSA and Bregman itera- 
tive method, can efficiently handle large-scale problems. 

Besides norm, another typical function to represent sparsity is Ip norm (0 < p < 
1). The problem is a non-convex one, thus it is often transferred to a solvable convex 
problem. Typical methods include FOCal Under-determined System Solver (FOCUSS) [32] 
and Iteratively Reweighted Least Square (IRLS) [33], [M]- Compared with the li norm 
based methods, these methods always need more computational time. 

Greedy pursuits: Rather than minimize an objective function globally, these methods 
make a local optimal choice after building up an approximation at each step. Matching Pur- 
suit (MP) and Orthogonal Matching Pursuit (OMP) |351I36| are two of the earliest greedy 
pursuit methods, then came Stagewise OMP (StOMP) p7] and Regularized OMP [38] 
as their improved versions. The reconstruction complexity of these algorithms is around 
0{KMN), which is significantly lower than BP methods. However, they require more 
measurements for perfect reconstruction and may fail to find the sparsest solution in cer- 
tain scenarios where h minimization succeeds. More recently, Subspace Pursuit (SP) [39], 
Compressive Sampling Matching Pursuit (CoSaMP) [40j and Iterative Hard Thresholding 
method (IHT) |41j have been proposed by incorporating the idea of backtracking. Theo- 
retically they offer comparable reconstruction quality and low reconstruction complexity as 
that of LP methods. However, all of them assume that the sparsity parameter K is known, 
whereas K may not be available in many practical applications. In addition, all greedy 
algorithms are more demanding in memory requirement. 

1.3 Our Work 

The convex optimization methods, such as 11 Js and SpaRSA, take all the data of A into 
account for each iteration, while the greedy pursuits consider each column of A for iterations. 
In this paper, the adaptive filtering framework, which uses each row of A for each iteration. 
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is applied for signal reconstruction. Moreover, instead of Zi norm, we take one of the 
approximations of Iq norm, which is widely used in recent contribution [l2], as the sparse 
constraint. The authors of [12] give several effective approximations of Iq norm for Magnetic 
Resonance Image (MRI) reconstruction. However, their solver of this problem adopts the 
traditional fix-point method, which needs much more computational time. Thus it is hard 
to implement for the large scale problem, with which our approach can effectively deal. 

According to our best knowledge, it is the first time that the adaptive filtering framework 
is employed to solve CS reconstruction problem. In our approach, two modified stochastic 
gradient-based adaptive filtering methods are introduced for signal reconstruction purpose, 
and a novel and improved reconstruction algorithm is proposed in the end. 

As the adaptive filtering framework can be used to solve under-determined equation, 
it can be readily accepted that CS reconstruction problem can be seen as a problem of 
sparse system identification by making some correspondence. Thus, a variant of Least 
Mean Square (LMS) algorithm, /q-LMS, which imposes a zero attractor on standard LMS 
algorithm and has good performance in sparse system identification, is introduced to CS 
signal reconstruction. In order to get better performance, an algorithm /o-Exponentially 
Forgetting Window LMS (Iq-EFWLMS) is also adopted. The convergence of the above two 
methods may be slow since I2 norm and Iq norm need to be balanced in their cost functions. 
As regard to faster convergence, a new method named lo-Zeio Attraction Projection (Iq- 
ZAP) with little sacrifice in accuracy is further proposed. Simulations show that Zq-LMS, 
Zq-EFWLMS and /q-ZAP have better performances in solving CS problem than the other 
typical algorithms. 

The remainder of this paper is organized as follows. In Section II, the adaptive filtering 
framework is reviewed and the methodological similarity between sparse system identifica- 
tion and CS problem is demonstrated. Then Zq-LMS, Zq-EFWLMS and Iq-ZAP are intro- 
duced. The convergence performance of Zq-LMS is analyzed in Section III. In Section IV, 
five experiments demonstrate the performances of the three methods in various aspects. 
Finally, our conclusion is made in Section V. 

2 Our Algorithms 

2.1 Adaptive filtering framework to solve CS problem 

Adaptive filtering algorithms have been widely used nowadays when the exact nature of 
a system is unknown or its characteristics are time-varying. The estimation error of the 
adaptive filter output with respect to the desired signal d{n) is denoted by 

e(n) = d{n) — x'^(n)w(n), (8) 

where w(n) = [wo{n), wi{n), . . . , WL-i{n)]^ and x(n) = [x{n),x{n — 1), ... , x{n — L + 1)]^ 
denote the filter coefficient vector and input vector, respectively, n is the time instant, and 
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Table 1: The correspondences between the variables in adaptive filter and those in CS 
problem. 

adaptive filter CS problem 
x(n) afc 
w(n) s 

d{n) Vk 

L is the filter length. By minimizing the cost function, the parameters of the unknown 
system can be identified iteratively. 

Recalling the CS problem, one of its requirements is to solve the under-determined 



equations y = As. Suppose that 

A = [a^ aJ,...,aT]'^; (9) 

a/c = [aki,ak2, ■■■ ,akN], k = 1,2, ... ,M; (10) 

S = [si,S2,. . . ,Sn]^] (11) 

y = [2/1,2/2, ••• (12) 



CS reconstruction problem can be regarded as an adaptive system identification problem by 
the correspondences listed in TABLE [TJ Thus equation ([2]) can be solved in the framework 
of adaptive filter. 

When the above adaptive filtering framework is applied to solve CS problem, there may 
not be enough data to train the filter coefficients into convergence. Thus, the rows of A 
and the corresponding elements of y are utilized recursively. The procedures using adaptive 
filtering framework are illustrated in Fig{T] Suppose that s(n) is the updating vector, the 
detailed update procedures are as follows. 

1. Initialize re = 1, s(0) = 0. 

2. Send data a^ and to adaptive filter, where 

k= mod(n,M) + l. (13) 

3. Use adaptive algorithm to update s(re). 

4. Judge whether stop condition is satisfied, 

||s(n) - s(re - 1)||2 < e or n>C, (14) 
where e > is a given error tolerance and C is a given maximum iteration number. 

5. When satisfied, send s(re) back to s and exit; otherwise re increases by one and go 
back to 2). 
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Figure 1: The framework of adaptive filter to solve CS reconstruction problem. 



Adaptive filtering methods are well-known while CS is a popular topic in recent years, so 
it is surprising that no literature employs adaptive filtering structure in CS reconstruction 
problem. The reason might be that the aim of CS is to reconstruct a sparse signal while 
the solutions to general adaptive filtering algorithms are not sparse. In fact, several LMS 
variations |43H45j . with some sparse constraints added in their cost functions, exist in sparse 
system identification. Thus, these methods can be applied to solve CS problem. 

In following subsections, /q-LMS algorithm and the idea of zero attraction will be firstly 
introduced. Then /q-EFWLMS, which imposes zero attraction on EFW-LMS, is introduced 
for better performance. Finally, to speed up the convergence of the two new methods, a 
novel algorithm Zq-ZAP, which adopts zero attraction in solution space, is further proposed. 

2.2 Based on /q-LMS algorithm 

LMS is the most attractive one in all adaptive filtering algorithms because of its simplicity, 
robustness and low computation cost. In traditional LMS the cost function is defined as 
squared error, 

eLMs(n) = |e(n)p. (15) 
Consequently, the gradient descent recursion of the filter coefficient vector is 

w(n + 1) = w(n) + /ie(n)x(n), (16) 

where positive parameter fj. is called step-size. 
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In order to improve the convergence performance when the unknown parameters are 
sparse, a new algorithm /q-LMS }43] is proposed by introducing a Iq norm penalty to the 
cost function. The new cost function is defined as 

WW = |e(n)|2 + 7|lw(n)||o, (17) 

where 7 > is a factor to balance the new penalty and the estimation error. Considering 
that Iq norm minimization is an NP hard problem, Iq norm is generally approximated by a 
continuous function. A popular approximation [46| is 

L-l 

||w(n)||o« j;(l-e-"l-»Wl), (18) 

4 = 

where the two sides of (llSp are strictly equal when parameter a approaches infinity. Ac- 
cording to (fT8]l . the proposed cost function can be rewritten as 

L-l 



6o-LMs(n) = |e(n)|2 + 7^ (l-e-"l«"Wlj . (19) 

i=0 

By minimizing ()19p . the new gradient descent recursion of filter coefficients is 

Wi{n + 1) = Wi{n) + fie{n)x{n - i) - Kasgn{wi{n))e-'^^'"'^''^^ , VO < i < L, (20) 
where k = /X7 and sgn(-) is a component-wise sign function defined as 

— 2; 7^ 0' 

sgn(x) = { m (21) 
elsewhere. 



To reduce the computational complexity of (j20p , especially that caused by the last term, 
the first order Taylor series expansion of exponential functions is taken into consideration, 



1 — a\x\ \x\ < ^; 



1 

(22) 



I elsewhere. 

Note that the approximation of (j22p is bound to be positive because the value of exponential 
function is larger than zero. Thus the final gradient descent recursion of filter coefficient 
vector is 

w(n + 1) = w(n) + fie{n)x{n) + Kg(w(n)), (23) 



where 



and 



g(w(n)) = [g{wo{n)),g{wi{n)), . . . , g{wL-i{n))f (24) 



oP'x + a — ^ < X < 0; 



a^x - a < X < (25) 



a' 

elsewhere. 
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Method 1. ^o-LMS method for CS 

1: Initiahze s(0) = 0, n=l, choose iJ,,a,K; 

2: while stop condition (|14|) is not satisfied; 

3: Determine the input vector x(n) and desired signal d(n) 

k = mod(n,M) + l; 

x(n) = afc; 

d{n) = Vk; 
4: Calculate error e(n) 

e(n) = d{n) — x'^(n)s(n); 
5: Update s(n) using LMS 

s(n) = s(n — 1) + ^e(n)x(n); 
6: Impose a zero attraction 

s(n) = s(n) + Kg (s(n - 1)); 
7: Iteration number increases by one 

n = n + 1; 

8: End while. 

The last term of (I23p is called zero attraction term, which imposes an attraction to 
zero on small coefficients. Since zero coefficients are the majority in sparse systems, the 
convergence acceleration of zero coefficients will improve identification performance. In CS, 
the zero attraction term will ensure the sparsity of the solution. 

By utilizing the correspondence in TABLE [H the final solution to CS problem can be 
obtained, which is summarize as Method 1. 

2.3 Based on /q-EFWLMS algorithm 

Recursive Least Square (RLS) is another popular adaptive filtering algorithm [17]i PS] - 
whose cost function is defined as the weighted sum of continuous squared error sequence, 

n 

eRLs(n) = J]A"-VWP. (26) 

4 = 1 

where <C A < 1 is called forgetting factor and 

e(i) = -xT(i)w(n). (27) 

The RLS algorithm is difficult to implement in CS because it costs a lot of computing 
resources. However, motivated by RLS, the approximation of its cost function with shorter 
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sliding-window is considered, which suggests a new penalty 



i=n~Q+l 



(28) 



where Q is the length of the sliding-window. The algorithm, which minimizes (j28p . is called 
Exponentially Forgetting Window LMS (EFW-LMS) |39]. The gradient descent recursion 
of the filter coefficient vector is 



where 



and 



w(n + 1) = w(n) + /iX(n)Ae'(n) 



X(n) = [x(n - Q + 1), x(n - Q + 2), . . . , x(n) 

A^-i ... " 
... 

... 1 ^ 

e'(n) = [e{n - Q + 1), e(n - Q + 2), . . . , e{n)f 
= d'(n) — X'^(n)w(n), 

d'(n) = - Q + 1), d(n - Q + 2), . . . , , 



(29) 

(30) 
(31) 



(32) 



(33) 



In order to obtain sparse solutions in CS problem, zero attraction is employed again. 
Thereby the final gradient descent recursion of the filter coefficient vector is 



w(n + 1) = w(n) + /iX(n)Ae'(n) + Kg(w(n)). 



(34) 



This algorithm is denoted as Zq-EFWLMS. 

The method to solve CS problem utilizing the correspondence in TABLE [1] based on 
Zq-EFWLMS is summarized in Method 2. 



2.4 Based on /q-ZAP algorithm 

The two methods described above /q-LMS and /q-EFWLMS can be considered as solutions to 
h — lo problem. Observing (j23|) and it is obvious that both gradient descent recursions 
are consisted of two parts. 

Wnew = Wprev + gradient correction + zero attraction, (35) 

The gradient correction term is to ensure y = As, and the zero attraction term is to 
guarantee the sparsity of the solution. Taking both parts into account, the sparse solution 
can finally be extracted. The updating procedures of the two methods proposed are shown 
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Method 2. /q-EFWLMS method for CS 
1: Initiahze s(0) = 0, choose Q, A, a, k; 
2: while stop condition (|14|) is not satisfied; 
3: Determine Q input vectors x(n — Q + 1), • • • ,x(n) 
and Q desired signals d{n — Q + 1), - ■ ■ , d{n) 
For i = n — Q + 1, n 

k = mod(z,M)+l; 
x(«) = afc; 

d{i) = Vk] 
End for; 

4: Calculate error vector e'(?i) 

e'(n) = d'(n) - XTs(n - 1); 
5: Update s(n) using EFW-LMS 

s(n) = s(n — 1) + /iX(?i)Ae'(n); 
6: Impose a zero attraction 

s(n) = s(n) + Kg (s(n - 1)); 
7: Iteration number increases by one 

n = n + 1; 

8: End while. 

in Figl2l(a) and Figl2l(b). However, convergence of the recursions may be slow because the 
two parts are hard to balance. 

According to the discussions above, CS problem ([2]) is ill-conditioned and its solution 
is a — M subspace. It implies that the sparse solution can be searched iteratively in 
the solution space in order to speed up convergence. That is, the gradient correction term 
can be omitted. The updating procedures are demonstrated in Figl2j(c), where the initial 
vector of s(0) is taken as the Least Square (LS) solution, which belongs to the solution 
space. Then in iterations, only the zero attraction term is used for updating the vector. 
The updated vector is replaced by the projection of the vector on solution space as soon as 
it departs from the solution space. Particularly, suppose s(n) is the result gained after nth 
zero attraction, its projection vector in the solution space satisfy the following equation 

s(n) = argmin ||s'(n) — s(n)||2, s.t. As'(n) = y. (36) 

s'(n) 
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Method 3. Zn-ZAP method for CS 



1: Initiahze s(0) = A+y, choose a,K, 

2: while stop condition (|14p is not satisfied 

3: Update s{n) using zero attraction 

s(n) = s{n — 1) + Kg(s(n — 1)); 
4: Project s(n) on the solution space 

s(n) = s(n) + A+(y - As(n)); 
5: Iteration number increases by one 

n = n + 1; 
6: End while 




(a) Zo-LMS (b) Zo-EFWLMS (c) /o-ZAP 



Figure 2: The updating procedures of the three methods, where Sq denotes the original 
signal and s(0) denotes the initial value.(a) Zq-LMS; (b) /q-EFWLMS; (c) /q-ZAP. 

Laplacian Method can be used to solve (j36p . 

s(n) = s(?i) + A+(y- As(n)), (37) 

where A"*" = A"'"(AA^) ^ is the Pseudo-inverse matrix of Least Square. This method is 
called /o-Zero Attraction Projection (Zq-ZAP), which is summarized in Method 3. 

2.5 Discussion 

The typical performance of the three proposed methods are briefly discussed here. 

• Memory requirement: /q-LMS and Iq-EFWLMS need storage for y, A, and s, so both 
their storage requirements are about MA'" + M + A^. /q-ZAP needs additional storage 
for, at least, the Pseudoinverse matrix of Least Square, A"*". For large scale situation, 
Zq-ZAP requires about twice the memory of the other two algorithms. 
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Table 2: The computational complexity of different method in each period. 



Methods 


Multiplications 


Additions 


Times of Zero Attraction ^ 


/o-LMS 


3A/iV 


2MN 


M 


lo-EFWLMS 


(2Q + 1)MN 


(2Q + l)MN 


M 


Zo-ZAP 


2MN 


2MN + N + M 


1 



[1] Please note the computations of zero attraction is not included in the 



above multiplicaitons and additions. 

• Computational complexity: the total computational complexity depends on the num- 
ber of iterations required and the complexity of each iteration. First, the complexity 
of each iteration of these methods will be analyzed. For simplicity, the complexity 
of each period, instead of that of each iteration, will be discussed. Here, a period 
is defined as all data in matrix A has been used for one time. For example, in one 
period, (j23p is iterated M times in /q-LMS and the projection is used once in Iq-ZAP. 
For each period, the complexity of the three methods is listed in TABLE [2j It can be 
seen that 

Zo-ZAP < /o-LMS < /o-EFWLMS. (38) 

Second, the number of periods of these methods will be discussed. It is impossible 
to accurately predict the number of periods of the three proposed methods required 
to find an approximate solution. However, according to the above discussion, the 
following equation is always satisfied for the number of periods 

/o-ZAP < /o-EFWLMS < /q-LMS. (39) 

Thus, taking both (j38p and (j39p into consideration, Iq-ZAP has significantly lower 
computation complexity than /q-LMS and /q-EFWLMS. Because /q-LMS has lower 
complexity for each period but larger number of periods than /q-EEWLMS, a com- 
parison between /q-LMS and /q-EEWLMS is hard to make. 

• De-noise performance: /q-LMS and /q-EEWLMS inherit the merit of LMS algorithm 
that has good de-noise performance. For /q-ZAP, 

y = As V = A(s v) = As (40) 

where v = A+v and v is an additive noise. Thus, the iterative vector is not projected 
on the true solution set s but the solution space s with additive noise v. However, we 
have 

E{vTv}^^E{vTv}, (41) 
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where E(-) denotes the expectation. The proof of (j4ip is in Appendix A. Equation 
()4ip shows that the power of v is far smaller than that of v since M <^ N. Moreover, 
the dimension of v (e.g. M) is far smaller than that of v (e.g. N). Therefore, Iq-ZAP 
also has good de-noise performance. 

• Implementation difficulty: Iq-ZAP need two parameters a and At, while in /q-LMS and 
Iq-EFWLMS, there is another parameter /x to be chosen. Thus, Zq-ZAP is easier to 
control than the other two algorithms. 

2.6 Some Comments 

Comment 1: Besides the proposed /q-LMS and Zq-EFWLMS, the idea of zero attraction 
can be readily adopted to improve most LMS variants, e.g. Normalized LMS (NLMS), 
which may be more attractive than LMS because of its robustness. The gradient descent 
recursion of the filter coefficient vector of /q-NLMS is 

/ -.N / N e(n)x(n) , , , ^. 

w(n + 1) = w(n) + + (w(n)) , (42) 

where /3 > is the regularization parameter. These variants can also improve the perfor- 
mance in sparse signal reconstruction. 

Comment 2: Equation (jlSp is one of the multiple approximations of Iq norm. In 
fact, many other continuous functions can be used for zero attraction. For example, an 
approximation suggested by Weston et al. [l6] is 

where (5 is a small positive number. By minimizing (|43p . the corresponding zero attraction 



IS 



T 

Kg{w) = K[g{wo),g{wi),...,g{wL-i)] , (44) 

where 

5sgn{x) 

= MTW- ^''^ 

This zero attraction term can also be used in the proposed /q-LMS, /q-EFWLMS and Iq-ZAP. 



3 Convergence analysis 



In this section, we will analyse the convergence performance of /q-LMS. The steady-state 
mean square derivation between the original signal and the reconstruction signal will be 
analyzed and the bound of parameter fi to guarantee convergence will be deduced. 

Theorem ^. -Suppose that s is the original signal, and s is the reconstruction signal by 
Zq-LMS, the final mean square derivation in steady state is 



11} 



C 



(46) 
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where 



M2 

^ " 2M^-(A^ + 2)/x2' ^^^^ 
a = E{(s-s)Tg(s)}; (48) 

b = E{gT(s)g(s)}; (49) 

Po = E{i;2}, (50) 

Pq is the power of measurement noise. At the same time, in order to guarantee convergence, 
parameter fi should satisfy 

0<,.<#^. (51) 

^ N + 2 ^ ' 

The proof of the theorem is postponed to Appendix B. 

As shown in Theorem 1, the final derivation is proportional to k and the power of 
measurement noise. Thus a large k will result in a large derivation; However, a small k means 
a weak zero attraction that will induce a slower convergence. Therefore, the parameter k is 
determined by a trade-off between convergence rate and reconstruction quality in particular 
applications. 

By equation ()78p and (j79p in appendix we have the following corollary 
Corollary i.'The upper bound of derivation is 



E{||s-s||^} < C 



2k{1 - ^){N + a||s||i) + Nt^a^ + 



(52) 



The upper bound is a constant under a given signal, thus it can be regarded as a rough 
criterion to choose the parameters. 



4 Experiment Results 

The performances of the presented three methods are experimentally verified and compared 
with typical CS reconstruction algorithms BP [I], SpaRSA [25], GPSR-BB [H], H-ls [l8], 
Bregman iterative algorithm based on FPC (FPC_AS) [3l], IRLS [33] and OMP [36]. In 
the following experiments, these algorithms are tested with parameters recommended by 
respective authors. The entries of M x A'^ sensing matrix A are independently generated 
from normal distribution with mean zero and variance 1/M. The locations of K nonzero 
coefficients of sparse signal s are randomly chosen with uniform distribution [1, A^]. The 
corresponding nonzero coefficients are Gaussian with mean zero and unit variance. Finally 
the sparse signal is normalized. The measurements are generated by the following noisy 
model 

y = As + V, (53) 

where v is an additive white Gaussian noise with covariance matrix c^Im (Im is an M x M 
identity matrix). 
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The parameters in stop condition (jl4p are e = 10 ^ for all three methods, C = 10^ for 
Zo-LMS and /q-EFWLMS, C = 10^ for /q-ZAP. 

Experiment 1. Algorithm Performance: In this experiment, the performances of the 
three proposed methods in solving CS problem are tested. The parameters used for the 
signal model ([53]) are cr = 3.2 x lO'^, N = 1000, M = 200, K = 30. The parameters for the 
three methods are as follows: 

• /o-LMS: a = 10, ^ = 0.1, k = 2 x 10"^ 

• /o-EFWLMS: q = 10, ^ = 0.1, k = 2x 10~^ Q = 4, A = 0.8; 

• /o-ZAP: Q = 10, K = 5 X 10^"^. 

The original signal and the estimation results obtained with /q-LMS, /q-EFWLMS, and 
Zq-ZAP are shown in Figl3j It can be seen that all three proposed methods reconstruct the 
original signal. The convergence curves of the three methods are demonstrated in FigHl 
where MSD denotes Mean Square Derivation. For /q-LMS and /q-EFWLMS, all data of 
matrix A is used once in each iteration (Please note that the stop condition is not used 
here). As can be seen in FigHl Zq-EFWLMS has the smallest MSD after convergence and 
Zq-ZAP achieves the fastest convergence with sacrifice in reconstruction quality. 

To compare with the other algorithms, CPU time is used as an index of complexity, 
although it gives only a rough estimation of complexity. Our simulations are performed in 
MATLAB 7.4 environment using an Intel T8300, 2.4GIIz processor with 2GB of memory, 
and under Microsoft Windows XP operating system. The final average CPU time (of total 
10 times, in seconds) and MSD are listed in TABLE [3l Here, the parameter in IRLS 
is p = 1/2. It can be seen that the proposed three methods have the least MSD. In 
addition, /q-ZAP is fastest among listed algorithms, though /q-LMS and Zq-EFWLMS have 
no significant advantage over the other algorithms. 

Experiment 2. Effect of Sparsity on the performance: This experiment explores the 
answer to this question: with the proposed methods, how sparse a source vector s should 
be to make its estimation possible under given number of measurements. The parameters are 
the same as the first experiment except that the noise variance is zero. Different sparsities 
(i.e. K) are chosen from 10 to 80. For each K, 200 simulations are conducted to calculate 
the probability of exact reconstruction in different algorithms. The results for all seven 
algorithms are demonstrated in FigJSJ As can be seen, performances of the three proposed 
methods far exceed those of the other algorithms. While all the other algorithms fail when 
sparsity K is larger than 40, the three methods proposed succeed until sparsity K reaches 
45. In addition, the proposed three methods have similar good performances. 

Experiment 3. Effect of number of measurements on the performance: This experiment 
is to investigate the probability of exact recovery when given different numbers of measure- 
ments and a fixed signal sparsity K = 50. The same setups of the first experiment is used 
except that the noise variance is zero. Different numbers of measurements M are chosen 
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Figure 3: Reconstruction result of the three proposed methods. 




Table 3: The CPU time and MSP. 



algorithms 


average CPU time (in sec) 


MoJJ 


Dr 


U.ooz 


i.i X iU 


(J MP 


r\ r\r\ a 

U.U94 


7.18 X 10 




i.odb 




ILJs 


LA6b 


7.68 X ID 


opaKlbA 


U.zzi 


7.25 X 1(J 


GPSR-BB 


0.266 


7.43 X 10-2 


FPC-AS 


0.086 


7.38 X 10-2 


/o-LMS 


1.152 


3.33 X lO-'^ 


Zo-EFWLMS 


1.544 


2.44 X 10-4 


Zo-ZAP 


0.068 


2.25 X 10-3 



from 140 to 320. All these algorithms are repeated 200 times for each value of M, and 
the probability curves are shown in Figj6j Again, it can be seen that the three proposed 
methods have the best performances. While all other algorithms fail when the measurement 
number M is lower than 230, the three proposed methods can still reconstruct exactly the 
original signal until M reaches 220. Meanwhile, the proposed algorithms have comparable 
good performances. 

Experiment 4- Robustness against noise: The fourth experiment is to test the effect 
of signal-to-noise ratio (SNR) on reconstruction performance, where SNR is defined as 
SNR = 10 log II AsIIj/IIvIIj. The parameters are the same as the first experiment and SNR 
is chosen from 4dB to 32dB. For each SNR, all these algorithms are repeated 200 times to 
calculate the MSD. FiglT] shows that the three new methods have better performances than 
the other traditional algorithms in all SNR. With the same SNR, the proposed algorithms 
can acquire small MSDs. In addition, the /q-EFWLMS has the smallest MSD and /q-ZAP 
has the largest MSD in the three new methods. Obviously, the above results are consistent 
with discussions in previous sections. 

Experiment 5. Effect of parameter on the performance of Iq-LMS: In this experi- 
ment, the condition (j5ip on step-size to guarantee the convergence of Zq-LMS will be ver- 
ified. The setups of this experiment are the same as the first experiment except that 
M = {200,250,300,350,400}. For each M, 100 simulations are conducted to calculate the 
probability of exact reconstruction using /q-LMS with the parameters a = 10, k = 10"^ and 
different step-sizes (from 0.3 to 1.1). FiglH] demonstrates that exact reconstruction cannot 
be achieved at about fj. = {0.4, 0.5, 0.6, 0.7, 0.8} with respective M values, which are consis- 
tent with the values Umax calculated by condition (|51|) . This result verifies our derivation 
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Figure 5: The probability of exact reconstruction versus sparsity K. 
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;ure 6: Tlie probability of exact reconstruction versus measurement number M. 
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Figure 7: The reconstruction MSD versus SNR. 



in Theorem 1. 



5 Conclusion 

The adaptive filtering framework is introduced at the first time to solve CS problem. Two 
typical adaptive filtering algorithms /q-LMS and /q-EFWLMS, both imposing zero attrac- 
tion method, are introduced to solve CS problem, as well as to verify our framework. In 
order to speed up the convergence of the two methods, a novel algorithm Iq-ZAP, which 
adopts the zero attraction method in the solution space, is further proposed. Thus the 
mean square derivation of /q-LMS in steady state has been deduced. The performances 
of these methods have been studied experimentally. Compared with those existing typical 
algorithms, they can reconstruct signal with more nonzero coefficients under a certain given 
number of measurements; while under a given sparsity, fewer measurements are required by 
these algorithms. Moreover, they are more robust against noise. 

Up to now, there is no theoretical result for determining how to choose the parameters 
of the proposed algorithms and how much the number of measurements M is in the context 
of RIP. These remain open problems for our future work. In addition, our future work 
includes the detailed discussion about the convergence performances of /q-EFWLMS and 
/o-ZAR 
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Figure 8: The probability of exact reconstruction of /q-LMS versus fi with different M. 

Appendix A Proof of ( HTI ) 

Proof The power of v is 

E{vTv} = E{vT(A+fA+v} 

= E I [aT(AAT)"^] [aT(AAT)-^] v| 
= E{v(AA^)-V} 

= E{vE{(AAT)-1} v} . (54) 



where the reason of the last equation of (j54p holding is that the noise v and measurement 
matrix A are independent. 
Suppose 

A = {0'ij)i<i<M,l<j<N 

As mentioned in Section I, aij is i.i. d. with Af{0, jj). Let 

B = A AT = (bij), 



Mj )l<i<M,l<j<M, 



(55) 
(56) 



Thus, for the diagonal components, 



N 



bii = ^al,l<i<M. 



k=l 

Since N is very large in CS, according to the central limit theorem 
equation holds approximately, 

/ N 2N\ 

bu^Af{Ebu,Dbu)=Mi—,j^j 



(57) 

the following 
(58) 



21 



where D{-} denotes the variance. Similarly, for the non-diagonal components, 

N 



h, - N (E6ij , ) = ( 0, ^ ) , i j. (59) 



Because N/M > 2N /M\ we have 

AA-»^L (00) 

Thus 

(AAT)-i « ^I. (61) 



Therefore equation ()54p can be simplified as 



E{v^v}«^E{vTv} (62) 



Appendix B Proof of Theorem 1 

Proof For simplicity, we use w(?i), x(n), and d{n) instead of s(A;), and y^, respectively. 
Suppose that Wq is the Wiener solution, thus 

d{n) = x"^(n)wo + v{n), (63) 

where v{n) is the measurement noise with zero mean. Define the misalignment vector as 

h(n) = w(n) — Wq. (64) 

Thus we have 

e(n) = v{n) — x'^(n)h(n). (65) 
Equation (I23p is equivalent to 

h(n + 1) = [l — /_fx(n)x"'"(n)] h(n) + Kg(w(?i)) + /it;(n)x(n) (66) 

Postmultiplying both sides of (j66p with their respective transposes, 

h(n + l)h'^(n + 1) = [l - ^x(n)x'^(n)] h(n)h'^(n) [l - /ix(n)x'^(n)] ""^ 

+ [l — /ix(n)x'^(n)] h(n)Kg(w(n)) 
+fj,v{n) [l — /ix(n)x^(n)] h(n)x^(n) 
+Kg(w(n))h'^(n) [l — /ix(n)x'^(n)] 
+K2g(w(n))g'^(w(n)) 
+K/i?;(n)g(w(n))x'^(n) 

(n)x(n)h'^(n) [l — /ix(7i)x'^(n)] 
(n)x(n)g'^(w(n)) 
+^t)^(n)x(n)x'^(n). (67) 
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Let 



K(n) = E{h(n)h'^(n)} 



(68) 



denote a second moment matrix of the coefficient misaUgnment vector. Taking expectations 
on both sides of (j67p and using the Independence Assumption [38], there is 



K(n + 1) =K(n) - ^ (RK(n) + K(n)R) + 2/i2RK(n)R 

+ /i^Rtr (RK(n)) + 2(1 - /^R)kE {h(n)g^(w(n))} 
+ K^Elg {w{n)) (w(?i))} + /i^PoR, 



where 



is the input correlation matrix, 



R = E{x(n)xT(n)} 



Po = E{v\n)} 



(69) 

(70) 
(71) 



is the minimum mean-squared estimation error and tr{-} denotes the trace. 

As mentioned in Section I, A is i.i.d. Gaussian with mean zero and variance 1/M. Then 



M 

Therefore equation (j69p can be simphfied as 



(72) 



K(n + 1) = (1 - ^ + ^)K(n) + ^tr{K{n)}l + 2(1 - ^)kE {h(n)gT(w(n))} 



M M2' 



M2 



M' 



+/^2E{g(w(n))gT(w(n))} + ^Pol- 



Let 



D{n) =E{||w(n) - Wo\\l} =tr{K(n)}. 
Take the trace on both side of ([75]) . 



D{n + 1) 
where 



^_ 2^ (iV + 2)^ 
M M2 



D{n) + 2(1 - ^)Ka(n) + K^fi{n) + ^Pq, 



a(n) = E {h^ {w{n))g{'w{n))] ; 



(73) 
■ 

(74) 

(75) 
(76) 



/3(n) = E{gT(w(n))g(w(n))} . 



(77) 
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Note that both a(n) and /3(n) are bounded, 



\a{n)\ = |E{(w(n) - Wo)g(w(?i))}| 
<E|{(w(n)-Wo)g(w(?i))}| 



N-l 

- X] ^ [{(""^iH - Woi)9{wi{n))}\ 

1=0 

= ^ E\{wi{n) - Woi)g{wi{n))\ 
k»(")l<^ 

< Yl ^{\iwiin)-Woi)\\g{w,in))\} 
k.(")l<^ 

< ^ aE {\{wi{n) - Woi)\} {■ / \g{wi{n))\ < a) 
k.(")l<^ 

< a {E\wi{n)\ + 

< N + a||wo||i; 
|/3(n)| = |E{gT(w(n))g(w(n))}| 

<E{|gT(w(n))g(w(n))|} 

N~l 

< ^E{bK(n))n 
i=0 

< iVa^ (79) 
Therefore the foUowing equation should be satisfied to guarantee convergence of (|75p . 

(80) 



(78) 



2fi {N + 2)f,^ 



We have 



< /U < 



2M 



+ 2 

The final mean square derivation in steady state is 



D{oo) = C 



2k(1- A)a(oo)+K2^(^) + ^P0 



where 



C 



M2 



2nM - {N + 2)^"^' 
a(oo) =E{hT(w(oo))g(w(oo))} ; 
/3(oo) = E{gT(w(cx)))g(w(cx)))}. 



(81) 



(82) 



(83) 

(84) 
(85) 
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